

# Electrical Engineering & Computer Science

University of Missouri

ALU

# ECE 4270/7270: Computer Organization, Spring 2023 LAB 3: Pipelined RISC-V Simulator Supporting Data Hazard Detection and Data Forwarding

# Scope

In this lab assignment, you will extend the instruction-level RISC-V simulator that you have developed in Lab 1. The RISC-V simulator that you have developed in Lab 1 was unpipelined, namely, it could take one instruction at a time and finish the execution within a (long) single cycle. In the pipelined version, there are five stages where each stage takes one cycle (where the cycle time is about ~5x smaller than the unpipelined version). As you may recall, these five pipeline stages are:

- IF: Instruction fetch
- ID: Instruction decode and register file read
- EX: Execution or address calculation
- MEM: Data memory access
- WB: Write back

You will work on ALU and Load/Store instructions as a starting point, and exclude branch and jump instructions from the scope of this lab assignment. You should build your pipelined instruction-level RISC-V simulator assuming you do not have any branches and jumps in your code (we exclude them for now. However, if you finish your lab earlier, you are welcome to add support for branches and jumps). The operations performed within each of these five stages are explained below.

#### 1. Instruction Fetch (IF):

IR <= Mem[PC] PC <= PC + 4

The instruction is fetched from memory into the instruction register (IR) by using the current program counter (PC). The PC is then incremented by 4 to address the next instruction. IR is used to hold the instruction (that is 32-bit) that will be needed in subsequent cycle during the instruction decode stage.

#### 2. Instruction Decode (ID):

A <= REGS[rs] B <= REGS[rt] ALUOut <= PC + immediate

In this stage, the instruction is decoded (i.e., opcode and operands are extracted), and the content of the register file is read (rs and rt are the register specifiers that indicate which registers to read from). The values read from register file are placed into two temporary registers called A and B. The values stored in A and B will be used in upcoming

cycles by other stages (e.g., EX, or MEM). This step also computes the branch target address and stores the address in ALUOut, where it will be used on the next clock cycle if the instruction is a branch.

#### 3. Execution (EX)

In this stage, we have an ALU that operates on the operands that were read in the previous stage. We can perform one of four functions depending on the instruction type.

#### i) Memory Reference (immediate):

The ALU is adding the operands to form the memory address.

#### ii) Register-register Operation ALUOut <= A op B

ALU performs the operation specified by the instruction on the values stored in temporary registers A and B and places the result into ALUOut.

#### 4. Memory Access (MEM):

for load: LMD <= MEM[ALUOut] for store: MEM[ALUOut] <= B

If the instruction is load, data is read from memory and stored in load memory data (LMD) register. If the instruction is store, then the value stored in register B is written into memory. The address of memory to be accessed is the value computed in the previous stage and stored in ALUOut register.

#### REGS[rd] <= ALUOut

Place the contents of ALUOut, which corresponds to the output of the ALU operation in the previous cycle, into the Result register.

#### 5. Writeback (WB)

for load instruction: REGS[rt] <= LMD

In this stage, we write the result back into the destination register in register file. The result comes from LMD.

From the figure below, notice that the temporary registers that we used (i.e. IR, A, B, imm, ALUOut, LMD) will be overwritten by the next instruction during the next cycle. Since these values have to be passed from one pipeline stage to next, we utilize pipeline registers to do so. The pipeline registers carry both data and control signals from one pipeline stage to another. Any value needed on a later pipeline stage should be placed in pipeline register and forwarded from one stage to another until it is no longer needed (i.e., not used in the upcoming stages). Below is the pipelined datapath that you are going to build. For now, you can omit the branch related links and mux, and assume that PC is incremented by 4 at every cycle.



As you can see, there are 4 pipeline registers and each of them has fields associated with proper temporary registers that we mentioned above.

The pipeline register called IF/ID has the following fields:

IF/ID.IR IF/ID.PC

On the other hand, pipeline register called ID/EX has the following fields:

ID/EX.IR

ID/EX.A

ID/EX.B

ID/EX.ALUOut

Similarly, pipeline register called EX/MEM has the following fields:

EX/MEM.IR

EX/MEM.A

EX/MEM.B

EX/MEM.ALUOut

And finally, pipeline register called MEM/WB has the following fields:

MEM/WB.IR MEM/WB.LMD

The events on pipeline stages and which values are forwarded from one stage to another through the pipeline registers are summarized below.

| Stage | Operation                                                                                                                          |                                      |
|-------|------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|
| IF    | IF/ID.IR <= MEM [PC]<br>IF/ID.PC <= PC+4                                                                                           |                                      |
|       |                                                                                                                                    |                                      |
| ID    | ID/EX.IR <= IF/ID.IR<br>ID/EX.A <= REGS[ IF/ID.IR[rs] ]<br>ID/EX.B <= REGS[ IF/ID.IR[rt] ]<br>ID/EX.ALUOut <= IF/ID.PC + immediate |                                      |
|       |                                                                                                                                    |                                      |
|       |                                                                                                                                    |                                      |
|       | ID/EX.ALUOut <= IF/ID.PC + immediate                                                                                               |                                      |
|       | ALU Instruction                                                                                                                    | Load/Store Instruction               |
| EX    | EX/MEM.IR <= ID/EX.IR                                                                                                              | EX/MEM.IR <= ID/EX.IR                |
|       | EX/MEM.ALUOut <= ID/EX.A op ID/EX.B                                                                                                | EX/MEM.ALUOut <= ID/EX.A + immediate |
|       |                                                                                                                                    |                                      |
| MEM   | MEM/WB.IR <= EX/MEM.IR                                                                                                             | MEM/WB.IR <= EX/MEM.IR               |
|       | REGS[ MEM/WB.IR[rt] ] <= EX/MEM.ALUOut                                                                                             | MEM/WB.LMD <=                        |
|       |                                                                                                                                    | MEM[ EX/MEM.ALUOut ] (for load) or   |
|       |                                                                                                                                    | MEM[ EX/MEM.ALUOut ] <= EX/MEM.B     |
|       |                                                                                                                                    | (for store)                          |
| WB    | DECSI MEM/WD IDI#11 /= MEM/WD I MD                                                                                                 |                                      |
| WB    | REGS[ MEM/WB.IR[rt] ] <= MEM/WB.LMD  (for load)                                                                                    |                                      |
|       | (101 10ad)                                                                                                                         |                                      |
|       |                                                                                                                                    |                                      |

## **Implementation**

You are given a skeleton of the pipelined RISC-V simulator (you can download it through the canvas), similar to one provided in Lab 1. However, instead of handle\_instruction(), you now have handle\_pipeline() function. This function is called every cycle. You will have more than one instruction on the fly (on different pipeline stages) in a given cycle. Within handle\_pipeline() function, there are five functions corresponding to each pipeline stage. handle\_pipeline() function looks like as follows:

```
void handle_instruction(){
     WB();
     MEM();
     EX();
     ID();
     IF();
}
```

As their names suggest WB(), MEM(), EX(), ID() and IF() functions should implement what is needed for writeback, memory access, execution, instruction decode, and instruction fetch stage of the pipeline, respectively. You are responsible to implement these functions. Note that, each stage should perform what it has to perform, nothing more, nothing less. While you implement these functions, you need to maintain the pipeline registers that we mentioned above. For this purpose, you are given CPU\_Pipeline\_Reg data structure in mu-RISC-V.h file. There are four instances created already for the pipeline registers and named as ID\_IF, IF\_EX, EX\_MEM, MEM\_WB (see mu-RISC-V.h for details).

These functions appear in reverse order within the code because you should use the outcome of particular stage as input to next stage in the next cycle, not in the current cycle. As an example, if we had IF() first, then ID() what would happen is the following. IF() would fetch new instruction and update its pipeline

register IF/ID. Then, ID() function would try to read the content of the IF/ID fields which have been updated very recently (within the same cycle).

Notice that this is not the intended behavior; ID() function should be able to read IF/ID fields that were updated in the previous cycle. To maintain this correct timing of consumer/producer relationship, the functions for pipeline stages appear on reverse order in the code. To elaborate on that more, for the first 4 cycles WB() function should do nothing since no instruction have reached that stage until the fifth cycle. Similarly, MEM() function should do nothing for the very first 3 cycles since no instruction have reached MEM stage until the fourth cycle. Similarly, EX stage should do nothing for the very first 2 cycles, and ID stage it for the very first cycle. However, if you change the order of these functions, you can see that all stages become active in the very first cycle: IF() would fill ID/IF that would be consumed by ID(), then ID() would fill ID/EX that would be consumed by EX(), and EX() would fill EX/MEM that would be consumed by MEM(), and MEM() would fill MEM/WB that would be consumed by WB(). Notice that all these stages would become active in the very first cycle. This is not correct behavior. The output of a particular pipeline stage should be consumed by the following pipeline stage in the next cycle. In short, do not change the order of functions that appear in handle\_pipeline(); this order will make your coding a lot easier.

You should also implement show\_pipeline() function that print out the content of pipeline registers for a given cycle. It will be used for debugging purposes. The outcome of this function should be similar to the following.

Current PC: value

IF/ID.IR value instruction( e.g. add \$1, \$2, \$3) IF/ID.PC

value //notice that it contains the next PC

ID/EX.IR value instruction

ID/EX.A value ID/EX.B value ID/EX.ALUOut value

EX/MEM.IR value EX/MEM.A value EX/MEM.B value EX/MEM.ALUOut value

MEM/WB.IR value MEM/WB.ALUOut value MEM/WB.LMD value

Along with pipeline registers, you may still need to maintain CPU\_State (i.e. CURRENT\_STATE, NEXT\_STATE) since CURRENT\_STATE has the REGS field that contains the values stored in the registers (you need to read them in ID stage). Similarly, NEXT\_STATE has REGS field where you should store the result of the instruction back into the corresponding register (in WB stage).

Once again, in the scope of this lab assignment, you do not work with branch and jump instructions. So the test files provided do not contain branch and jump instructions.

#### Part II

In the second part of this lab, you will extend the pipelined RISC-V simulator that you have developed in Part I to handle possible data hazards. In the lectures, we discussed two ways to handle data hazards: i) by introducing pipeline stalls; ii) by data forwarding.

#### 1. Data Hazard Detection and Introducing Pipeline Stalls:

Notice that the pipeline registers keep track of source and destination registers (i.e., registers to be read/written). So, to detect a data hazard, we need to compare the source and destination registers of instructions that we consider for data dependence. We know the source and destination registers of an instruction in ID stage, so we have to test the following conditions:

If you recall from our discussion in class, we had conditions to detect data hazards:

```
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd \neq 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd \neq 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
```

where RegisterRd, RegisterRs and RegisterRt represent the register number of rd, rs and rt fields of the instructions, respectively.

Namely, we can detect a data hazard for an instruction that is in the ID stage, if there is an instruction in EX or MEM stage where the destination register is one of the source registers of the instruction in ID stage.

If the instruction in the ID stage is stalled, then the instruction in the IF stage must also be stalled; otherwise, we lose the fetched instruction. Preventing these two instructions from making progress through the pipeline is accomplished simply by preventing the PC and the IF/ID pipeline register from changing. Provided these registers are preserved, the instruction in the IF stage will continue to be read using the same PC, and the registers in the ID stage will continue to be read using the same instruction fields in the IF/ID pipeline register.

On the other hand, we can insert pipeline bubbles (i.e., stalls) by setting all control signals to 0 in ID/EX pipeline register. During the consecutive cycles, these signals will be forwarded to EX, MEM, and WB stages through pipeline registers and will create a nop instruction (basically there will be no operation performed, but only control signals will be forwarded).

Once the producer (i.e., source) instruction finishes WB stage, the stalled instructions (i.e., the ones in ID and IF stages) can resume and make progress through the pipeline. Notice that the instruction in ID stage needs to read the source register whose value has just updated by the former instruction in WB stage. The control signals should be repopulated accordingly and should be forwarded to the next pipeline stage. The instruction in IF stage, on the other hand, should move into the ID stage, and the PC should be updated accordingly, such that we can fetch new instruction in the next cycle.

In this section of the lab, you are asked to implement data hazard detection and introduce pipeline stalls to avoid these hazards.

#### 2. Data Forwarding to Reduce the Pipeline Stalls

So far, we have not considered any forwarding paths in the pipeline. Thus, the data hazards would not be resolved until the producer (i.e., source) instruction reaches WB stage. However, as we have discussed in the classroom, the value that causes the dependence may become available earlier in the pipeline (either at the end of EX stage, or at the end of MEM stage), so the value can be forwarded to the EX stage for an instruction that is waiting for that particular value. Forwarding a value before the WB stage effectively reduces the stalls needed to avoid data hazards.

In this section of the lab, you are asked to implement data forwarding on top data hazard detection. Remember that we had the following forwarding conditions:

```
i.
          To forward from EX stage:
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and
                                               (EX/MEM.RegisterRd = ID/EX.RegisterRs)) {
      ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and
                                               (EX/MEM.RegisterRd = ID/EX.RegisterRt)){
      ForwardB = 10
   ii.
          To forward from MEM stage:
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd \neq 0) and not
                                        (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and
                                        (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and
                                        (MEM/WB.RegisterRd = ID/EX.RegisterRs)){
      ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd \neq 0) and not
                                        (EX/MEM.RegWrite and (EX/MEM.RegisterRd \neq 0) and
                                        (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and
                                        (MEM/WB.RegisterRd = ID/EX.RegisterRt)){
      ForwardB = 01
}
```

Notice that, above forwarding conditions already takes care of double data hazards.

#### 3. Details

You can exclude branch and jump instructions from the scope of this lab assignment.

Since you will implement two versions of the pipeline, namely:

- i) Pipeline that detects data hazards, but cannot forward data
- ii) Pipeline that detects data hazards and can forward data

you need to define a global variable called ENABLE\_FORWARDING and it can be set to 0 or 1 through a simulator command called "forwarding". You can add a case statement in the handle\_command() function, similar to the following:

Above case statement would allow you to set/reset ENABLE\_FORWARDING variable that indicates whether you need to run the simulator with data forwarding capability or not. Default value of ENABLE\_FORWARDING should be 0 (i.e., FALSE). In this case, simulator should simulate a pipeline that detects data hazards and introduce stalls to avoid them, but should not perform any data forwarding. On the other hand, when ENABLE\_FORWARDING is set to 1 (i.e., TRUE) through running "forwarding 1" command in the simulator, simulator should simulate a pipeline that detects data hazards and uses data forwarding to avoid the data hazards.

ENABLE\_FORWARDING = FALSE; //default value, no data forwarding ENABLE\_FORWARDING = TRUE; // will be set to TRUE when "forwarding 1" command is used data forwarding should be in place when data hazard is detected.

You need to add the control signals required and maintain them through the pipeline stages via pipeline registers.

You are given a test input file in Canvas that contains a set of instructions. You should use it to test your implementation

# **Grading Rubric**

**Code:** Pipelined RISC-V simulator (40)

Code: Pipelined RISC-V simulator that handles data hazards (40)

**Report:** 20 points

### Pipelined RISC-V simulator (40 points):

To get a full credit for the code, your pipelined RISC-V simulator should work correctly (i.e., the instructions should be fetched, decoded, executed correctly. The data being forwarded among pipeline registers should be correct. Your code will be debugged step by step to see if the transitions between pipeline stages are handled correctly).

#### <u>Pipelined RISC-V simulator that handles data hazards (40 points):</u>

In order to get full credit for the code, your pipelined RISC-V simulator that handles data hazards should work correctly (i.e., it should introduce pipeline stalls to avoid data hazards if there are data dependencies, and forwarding should work correctly when it is enabled).

#### <u>Lab report (20 points):</u>

Your report should give details about the work distribution within the group (who did what), milestones in your work and your implementation decisions (why did you choose the way you did it, and/or how did you do that). Explain any difficulty you observed, any optimizations you have made.

#### **Submission**

Please pay attention to this guideline here: You should submit the lab report along with the pipelined RISC-V simulator code you developed (provide makefile, as well). Please, generate a pdf file for your report and name it lab3\_report\_groupX.pdf, then place it into the folder called lab3\_groupX (where X is your group number). The folder lab3\_groupX should also contain the src/ folder that contains the code for pipelined RISC-V simulator (i.e. mu-RISC-V.h, mu-RISC-V.c and Makefile). Then, please compress the lab3\_groupX folder as lab3\_groupX.tar.gz and submit it through Canvas.

#### **Due Date**

Your lab is due on: 14/4/2023